[TLDR] Choosing a Model, Shaping a Future: Comparing LLM Perspectives on Sustainability and its Relationship with AI
How different LLMs conceptualize sustainability and the implications for AI-driven decision-making
Authors: A. Bush et al.
Published on arXiv: 2025-05-20
Link: http://arxiv.org/abs/2505.14435v1
Institutions: Technical University Dortmund, Germany • University Alliance Ruhr, Germany • University of Duisburg-Essen, Germany
Keywords: Large Language Models, LLMs, Sustainability, Artificial Intelligence, AI, GPT, Claude, LLaMA, Mistral, DeepSeek, Sustainable Development Goals, SDG, Twin transition, Bias, Psychometric assessment, Technology governance
Organizations increasingly rely on AI systems, especially Large Language Models (LLMs), to inform sustainability-related decisions. Because these models reflect societal values and biases from their training data, their outputs can significantly influence sustainability strategies. Understanding how different LLMs conceptualize sustainability, and AI's role within it, is therefore crucial for organizations.
To probe these perspectives, the authors ran a systematic evaluation using validated assessment tools:
- Five state-of-the-art LLMs (Claude, DeepSeek, GPT, LLaMA, Mistral) were evaluated on their views regarding sustainability and the AI-sustainability relationship.
- The instruments were validated psychometric questionnaires (AI-SDG17, AISPI) plus custom items; each model completed each questionnaire 100 times for statistical robustness.
- Standardized prompts and response formats facilitated comparability, and nonparametric multiple contrast testing procedures (MCTPs) enabled rigorous statistical analysis.
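The repeated-administration setup described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the 1 to 5 rating scale, the prompt wording, the `ask` callable, and the seeded stub that stands in for a real model API are all assumptions. The `relative_effect` helper computes a pairwise rank-based effect of the kind nonparametric contrast procedures build on; it is not a full MCTP, which would also produce adjusted p-values and confidence intervals.

```python
import random
import re

LIKERT_MIN, LIKERT_MAX = 1, 5  # assumed scale; the summary does not state it


def parse_likert(reply):
    """Return the first integer in the reply if it lies on the scale, else None."""
    match = re.search(r"-?\d+", reply)
    if match is None:
        return None
    value = int(match.group())
    return value if LIKERT_MIN <= value <= LIKERT_MAX else None


def administer(ask, items, runs=100):
    """Ask every item `runs` times with one fixed prompt template.

    `ask` is any callable mapping a prompt string to a reply string
    (hypothetical stand-in for a model API call).
    Returns {item: [valid integer scores]}.
    """
    scores = {item: [] for item in items}
    for _ in range(runs):
        for item in items:
            prompt = (
                f"Rate the statement from {LIKERT_MIN} to {LIKERT_MAX}. "
                f"Answer with a single number.\n{item}"
            )
            value = parse_likert(ask(prompt))
            if value is not None:  # discard replies that cannot be parsed
                scores[item].append(value)
    return scores


def relative_effect(xs, ys):
    """Estimate P(X < Y) + 0.5 * P(X = Y) over all pairs; 0.5 means no tendency."""
    wins = sum((x < y) + 0.5 * (x == y) for x in xs for y in ys)
    return wins / (len(xs) * len(ys))


def make_stub(seed, weights):
    """Build a fake model that answers with a weighted random rating."""
    rng = random.Random(seed)
    return lambda prompt: str(rng.choices([1, 2, 3, 4, 5], weights=weights)[0])


if __name__ == "__main__":
    items = ["AI will help reduce inequalities."]  # illustrative item, not from the paper
    cautious = administer(make_stub(1, [4, 3, 2, 1, 1]), items)
    optimistic = administer(make_stub(2, [1, 1, 2, 3, 4]), items)
    print(relative_effect(cautious[items[0]], optimistic[items[0]]))
```

Keeping the prompt template and the parser fixed across models is what makes the per-model score distributions comparable; a value of `relative_effect` well above 0.5 indicates that the second model tends to give higher ratings than the first.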
The results show clear, model-dependent divergences in viewpoints:
- All models regarded AI’s impact on “Reducing Inequalities” (SDG 10) as lowest; LLaMA was notably optimistic about various SDGs, while Mistral was most conservative.
- GPT appeared skeptical about AI-sustainability compatibility; LLaMA was strongly techno-optimistic.
- Significant and systematic differences were found between models across twin transition and competing interests measures.
- While all models prioritized sustainability over AI, optimism regarding AI-sustainability integration varied widely, with LLaMA the most optimistic and Mistral the least.
- Attribution of responsibility differed: GPT and LLaMA included all institutions, while DeepSeek was more selective. Trust was generally highest in technology companies and governments, though LLaMA and Mistral placed more trust in NGOs and research organizations.
From these findings, the authors draw the following conclusions and suggestions for future work:
- Pronounced, model-specific biases shape LLM perspectives on sustainability, from techno-optimism to techno-skepticism.
- Model selection can significantly skew organizations’ sustainability recommendations and strategies.
- Exclusive reliance on a single LLM carries risks of polarized or biased outcomes in sustainability tasks.
- The authors recommend transparently documenting LLM perspectives and using ensemble approaches to obtain better-balanced decision support.
- Further studies should examine sources of these perspectival differences, refine evaluation instruments, and assess the impacts of language and cultural context.
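One simple way to act on the ensemble recommendation is to aggregate ratings from several models and flag items where they disagree. The sketch below is only an illustration under assumptions: the function name, the median-plus-range rule, the disagreement threshold, and the example numbers are all hypothetical and are not taken from the paper.

```python
from statistics import median


def ensemble_summary(ratings, spread_threshold=1.0):
    """Summarize one item's ratings across models.

    ratings: {model_name: mean score for the item}
    spread_threshold: assumed cutoff above which the item counts as contested
    """
    values = list(ratings.values())
    spread = max(values) - min(values)
    return {
        "median": median(values),               # robust central estimate
        "spread": spread,                       # range across models
        "contested": spread > spread_threshold, # models disagree noticeably
        "lowest": min(ratings, key=ratings.get),
        "highest": max(ratings, key=ratings.get),
    }


if __name__ == "__main__":
    # Made-up scores for placeholder models, not results from the study.
    example = {"model_a": 2.0, "model_b": 4.5, "model_c": 3.0}
    print(ensemble_summary(example))
```

Reporting the spread and the outlying models alongside the median keeps the disagreement visible instead of averaging it away, which fits the call for transparency about each model's perspective.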